QUANTIZATION AND CENTROIDAL VORONOI TESSELLATIONS FOR
PROBABILITY MEASURES ON DYADIC CANTOR SETS

MRINAL KANTI ROYCHOWDHURY


Abstract. Quantization of a probability distribution is the process of estimating a given
probability by a discrete probability that assumes only a finite number of levels in its support.
Centroidal Voronoi tessellations (CVT) are Voronoi tessellations of a region such that the
generating points of the tessellations are also the centroids of the corresponding Voronoi regions.
In this paper, we investigate the optimal quantization and the centroidal Voronoi tessellations
with $n$ generators for a Borel probability measure $P$ on $\mathbb{R}$ supported by a dyadic Cantor set
generated by two self-similar mappings with similarity ratios $r$, where $0 < r \le \frac{5-\sqrt{17}}{2}$.


1. Introduction 


In the context of communication theory, quantization is the process by which data is reduced
to a simpler, coarser representation that is more compatible with digital processing.
Loosely speaking, quantization is the heart of analog-to-digital conversion. It is an area which
has increased in importance in the last few decades due to the burgeoning advances in digital
technology. Quantization for probability distributions refers to the idea of estimating a given
probability measure by a discrete probability measure with finite support. We refer to [GG, GN, Z]
for surveys on the subject and comprehensive lists of references to the literature; see
also [AW, DR, GKL]. For the mathematical treatment of quantization one is referred to
Graf-Luschgy's book (see [GL1]). Let $\mathbb{R}^d$ denote the $d$-dimensional Euclidean space, $\|\cdot\|$ denote the
Euclidean norm on $\mathbb{R}^d$ for any $d \ge 1$, and $n \in \mathbb{N}$. Then, the $n$th quantization error for a Borel
probability measure $P$ on $\mathbb{R}^d$ is defined by


\[
V_n := V_n(P) = \inf\Big\{ \int \min_{a \in \alpha} \|x - a\|^2 \, dP(x) : \alpha \subset \mathbb{R}^d, \ \operatorname{card}(\alpha) \le n \Big\}, \tag{1}
\]

where the infimum is taken over all subsets $\alpha$ of $\mathbb{R}^d$ with $\operatorname{card}(\alpha) \le n$. If $\int \|x\|^2 \, dP(x) < \infty$, then
there is some set $\alpha$ for which the infimum is achieved (see [GKL, GL1, GL2]). Such a set $\alpha$ for
which the infimum occurs and contains no more than $n$ points is called an optimal set of $n$-means,
or optimal set of $n$-quantizers. If $\alpha$ is a finite set, in general, the error $\int \min_{a \in \alpha} \|x - a\|^2 \, dP(x)$
is often referred to as the cost or distortion error for $\alpha$, and is denoted by $V(P; \alpha)$. Thus,
$V_n := V_n(P) = \inf\{V(P; \alpha) : \alpha \subset \mathbb{R}^d, \ \operatorname{card}(\alpha) \le n\}$. It is known that for a continuous
probability measure $P$ an optimal set of $n$-means always has exactly $n$ elements (see [GL1]).
Given a finite subset $\alpha \subset \mathbb{R}^d$, the Voronoi region generated by $a \in \alpha$ is defined by

\[
M(a|\alpha) = \{x \in \mathbb{R}^d : \|x - a\| = \min_{b \in \alpha} \|x - b\|\},
\]

i.e., the Voronoi region generated by $a \in \alpha$ is the set of all elements in $\mathbb{R}^d$ which are closer to $a$
than to any other element in $\alpha$, and the set $\{M(a|\alpha) : a \in \alpha\}$ is called the Voronoi diagram or
Voronoi tessellation of $\mathbb{R}^d$ with respect to $\alpha$. A Borel measurable partition $\{A_a : a \in \alpha\}$ of $\mathbb{R}^d$
is called a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ (and $P$) if $P$-almost surely $A_a \subset M(a|\alpha)$
for every $a \in \alpha$. Notice that if $\alpha = \{a_1, a_2, \cdots, a_n\}$ is an optimal set of $n$-means for $P$ and
$\{A_1, A_2, \cdots, A_n\}$ is a Voronoi partition with respect to $\alpha$, then $V_n = \sum_{i=1}^{n} \int_{A_i} \|x - a_i\|^2 \, dP(x)$.
Centroidal Voronoi tessellations (CVTs) are Voronoi tessellations of a region such that the
generating points of the tessellations are also the centroids of the corresponding Voronoi regions.
A CVT with $n$ generators, also called a CVT with $n$-means, associated with a probability
measure $P$ is called an optimal centroidal Voronoi tessellation (OCVT) if the $n$ generators form
an optimal set of $n$-means for $P$. Let us now state the following proposition (see [GG, GL1]).

Proposition 1.1. Let $\alpha$ be an optimal set of $n$-means, $a \in \alpha$, and $M(a|\alpha)$ be the Voronoi
region generated by $a \in \alpha$, i.e., $M(a|\alpha) = \{x \in \mathbb{R}^d : \|x - a\| = \min_{b \in \alpha} \|x - b\|\}$. Then, for
every $a \in \alpha$, (i) $P(M(a|\alpha)) > 0$, (ii) $P(\partial M(a|\alpha)) = 0$, (iii) $a = E(X : X \in M(a|\alpha))$, and (iv)
$P$-almost surely the set $\{M(a|\alpha) : a \in \alpha\}$ forms a Voronoi partition of $\mathbb{R}^d$.

Remark 1.2. Let $\alpha$ be an optimal set of $n$-means and $a \in \alpha$, then by Proposition 1.1, we have

\[
a = \frac{1}{P(M(a|\alpha))} \int_{M(a|\alpha)} x \, dP = \frac{\int_{M(a|\alpha)} x \, dP}{\int_{M(a|\alpha)} dP},
\]

which implies that $a$ is the centroid of the Voronoi region $M(a|\alpha)$ associated with the probability
measure $P$ (see also [DFG]). Thus, we can say that for a Borel probability measure $P$ on $\mathbb{R}^d$,
an optimal set of $n$-means forms a centroidal Voronoi tessellation of $\mathbb{R}^d$; however, the converse
is not true in general (see [DFG, GG]).
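The failure of the converse can already be seen on a toy discrete measure. The following is a minimal sketch, not taken from the paper: the four-atom uniform measure and the helper names `voronoi_cells`, `distortion` and `is_centroidal` are illustrative assumptions. It exhibits two generator sets that both satisfy the centroid condition, only one of which has the smaller distortion.

```python
from fractions import Fraction

def voronoi_cells(atoms, alpha):
    """Assign each atom index to its nearest generator (ties go to the smaller generator)."""
    cells = {a: [] for a in alpha}
    for i, x in enumerate(atoms):
        nearest = min(alpha, key=lambda a: (abs(x - a), a))
        cells[nearest].append(i)
    return cells

def distortion(atoms, weights, alpha):
    """V(P; alpha) = sum_i w_i * min_a (x_i - a)^2 for a discrete measure P."""
    return sum(w * min((x - a) ** 2 for a in alpha) for x, w in zip(atoms, weights))

def is_centroidal(atoms, weights, alpha):
    """True if every generator equals the P-centroid of its own Voronoi cell."""
    for a, idx in voronoi_cells(atoms, alpha).items():
        mass = sum(weights[i] for i in idx)
        if mass == 0:
            return False
        if sum(weights[i] * atoms[i] for i in idx) / mass != a:
            return False
    return True

# Hypothetical example: uniform measure on the atoms {0, 2, 3, 5}.
atoms = [Fraction(0), Fraction(2), Fraction(3), Fraction(5)]
weights = [Fraction(1, 4)] * 4
cvt_a = [Fraction(0), Fraction(10, 3)]   # cells {0} and {2, 3, 5}
cvt_b = [Fraction(1), Fraction(4)]       # cells {0, 2} and {3, 5}
```

Both `cvt_a` and `cvt_b` pass `is_centroidal`, yet their distortions differ, so a centroidal set need not be an optimal set of two-means for this toy measure.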

Let $S_1, S_2 : \mathbb{R} \to \mathbb{R}$ be two contractive similarity mappings such that $S_1(x) = rx$ and $S_2(x) = rx + (1 - r)$ for $0 < r < \frac12$. Then, there exists a unique Borel probability measure $P$ on $\mathbb{R}$ such that $P = \frac12 P \circ S_1^{-1} + \frac12 P \circ S_2^{-1}$, where $P \circ S_i^{-1}$ denotes the image measure of $P$ with respect to $S_i$ for $i = 1, 2$ (see [H]). Such a $P$ has support the Cantor set generated by the two mappings $S_1$ and $S_2$. In this paper, in Section 3, we have given a centroidal Voronoi tessellation (CVT) for the probability measure $P$ supported by the Cantor set generated by $S_1(x) = \frac49 x$ and $S_2(x) = \frac49 x + \frac59$. The formula in this paper can be used to obtain a CVT for $P$ on any Cantor set generated by $S_1(x) = rx$ and $S_2(x) = rx + (1 - r)$, where $0.4364590141 < r < 0.4512271429$ (written up to ten decimal places). For the classical Cantor set $C$, i.e., when $r = \frac13$, in the paper [GL2], Graf and Luschgy determined the optimal sets of $n$-means for the probability measure $P$ for all $n \ge 2$. For a long time it was believed that using the same formula given in [GL2], one could determine the optimal sets of $n$-means for all $n \ge 2$ for the probability measure $P$ supported by any Cantor set generated by the two mappings $S_1(x) = rx$ and $S_2(x) = rx + (1 - r)$ for $0 < r \le \frac{5 - \sqrt{17}}{2}$, i.e., if $0 < r < 0.4384471872$. In Proposition 4.3, we have shown that it is not always true, by showing that if $0.4371985206 < r < 0.4384471872$ and $n$ is not of the form $2^{\ell(n)}$ for any positive integer $\ell(n)$, then the distortion error of the CVT obtained using the formula in this paper is smaller than the distortion error of the CVT obtained using the formula given by Graf-Luschgy in [GL2]. In fact, in Section 5, we have further improved this bound, which is given in Remark 5.3. In addition, the work in this paper shows that under squared error distortion measure, the centroid condition is not sufficient for optimal quantization for a singular continuous probability measure. Recall that it is already known that the centroid condition is not sufficient for optimal quantization for an absolutely continuous probability measure (see [DFG] and [GG, Chapter 6]).

2. Preliminaries 

By a string or a word $\sigma$ over an alphabet $\{1, 2\}$, we mean a finite sequence $\sigma := \sigma_1 \sigma_2 \cdots \sigma_k$ of symbols from the alphabet, where $k \ge 1$, and $k$ is called the length of the word $\sigma$. A word of length zero is called the empty word, and is denoted by $\emptyset$. By $\{1, 2\}^*$, we denote the set of all words over the alphabet $\{1, 2\}$ of some finite length $k$ including the empty word $\emptyset$. By $|\sigma|$, we denote the length of a word $\sigma \in \{1, 2\}^*$. For any two words $\sigma := \sigma_1 \sigma_2 \cdots \sigma_k$ and $\tau := \tau_1 \tau_2 \cdots \tau_\ell$ in $\{1, 2\}^*$, by $\sigma\tau := \sigma_1 \cdots \sigma_k \tau_1 \cdots \tau_\ell$, we mean the word obtained from the concatenation of the two words $\sigma$ and $\tau$. Let $S_1$ and $S_2$ be two contractive similarity mappings on $\mathbb{R}$ given by $S_1(x) = \frac49 x$ and $S_2(x) = \frac49 x + \frac59$. For $\sigma := \sigma_1 \sigma_2 \cdots \sigma_k \in \{1, 2\}^k$, set $S_\sigma := S_{\sigma_1} \circ \cdots \circ S_{\sigma_k}$ and $J_\sigma := S_\sigma([0, 1])$. For the empty word $\emptyset$, by $S_\emptyset$, it is meant the identity mapping on $\mathbb{R}$, and write $J := J_\emptyset = S_\emptyset([0, 1]) = [0, 1]$. Then, the set $C := \bigcap_{k \in \mathbb{N}} \bigcup_{\sigma \in \{1,2\}^k} J_\sigma$ is known as the Cantor set generated by the two mappings $S_1$ and $S_2$, and equals the support of the probability measure $P$ given by $P = \frac12 P \circ S_1^{-1} + \frac12 P \circ S_2^{-1}$. For any $\sigma \in \{1, 2\}^*$, the intervals $J_{\sigma 1}$ and $J_{\sigma 2}$ into which $J_\sigma$ is split up are called the basic intervals of $J_\sigma$. For $\sigma = \sigma_1 \sigma_2 \cdots \sigma_k \in \{1, 2\}^k$, $k \ge 0$, write $p_\sigma := \frac{1}{2^k}$ and $s_\sigma := \left(\frac49\right)^k$.
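The coding of basic intervals by words can be made concrete with exact rational arithmetic. This is a minimal sketch under the definitions above; the function names `S`, `J`, `p` and `s` mirror the symbols $S_\sigma$, $J_\sigma$, $p_\sigma$, $s_\sigma$, and representing words as Python strings over `'1'` and `'2'` is an assumption made for illustration.

```python
from fractions import Fraction

R = Fraction(4, 9)  # similarity ratio used in Sections 2 and 3

def S(word, x, r=R):
    """Apply S_sigma = S_{s1} o ... o S_{sk} to x; `word` is a string over '1', '2'."""
    for symbol in reversed(word):        # the innermost map S_{sk} acts first
        x = r * x if symbol == "1" else r * x + (1 - r)
    return x

def J(word, r=R):
    """End points of the basic interval J_sigma = S_sigma([0, 1])."""
    return S(word, Fraction(0), r), S(word, Fraction(1), r)

def p(word):
    """Mass p_sigma = P(J_sigma) = 2^(-|sigma|)."""
    return Fraction(1, 2 ** len(word))

def s(word, r=R):
    """Contraction ratio s_sigma = r^|sigma|."""
    return r ** len(word)
```

For instance, `J("1")` and `J("2")` return the end points of $J_1 = [0, \frac49]$ and $J_2 = [\frac59, 1]$, and the empty string plays the role of the empty word.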

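Let $X$ be a random variable with probability distribution $P$. By $E(X)$ and $V := V(X)$, we mean the expectation and the variance of the random variable $X$. For words $\beta, \gamma, \cdots, \delta$ in $\{1, 2\}^*$, by $a(\beta, \gamma, \cdots, \delta)$, we mean the conditional expectation of the random variable $X$ given $J_\beta \cup J_\gamma \cup \cdots \cup J_\delta$, i.e.,

\[
a(\beta, \gamma, \cdots, \delta) = E(X \mid X \in J_\beta \cup J_\gamma \cup \cdots \cup J_\delta) = \frac{1}{P(J_\beta \cup \cdots \cup J_\delta)} \int_{J_\beta \cup \cdots \cup J_\delta} x \, dP. \tag{2}
\]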

Let us now give the following lemmas. 


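Lemma 2.1. Let $f : \mathbb{R} \to \mathbb{R}^+$ be Borel measurable and $k \in \mathbb{N}$. Then

\[
\int f \, dP = \sum_{\sigma \in \{1,2\}^k} \frac{1}{2^k} \int f \circ S_\sigma \, dP.
\]

Proof. We know $P = \frac12 P \circ S_1^{-1} + \frac12 P \circ S_2^{-1}$, and so by induction $P = \sum_{\sigma \in \{1,2\}^k} \frac{1}{2^k} P \circ S_\sigma^{-1}$, and thus the lemma is yielded. □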


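Lemma 2.2. $E(X) = \frac12$ and $V := V(X) = \frac{5}{52}$, and for any $x_0 \in \mathbb{R}$,

\[
\int (x - x_0)^2 \, dP(x) = V(X) + \left(x_0 - \tfrac12\right)^2.
\]

Proof. We have

\[
E(X) = \int x \, dP(x) = \frac12 \int \frac49 x \, dP(x) + \frac12 \int \left(\frac49 x + \frac59\right) dP(x) = \frac29 E(X) + \frac29 E(X) + \frac{5}{18} = \frac49 E(X) + \frac{5}{18},
\]

which implies $E(X) = \frac12$. Now,

\[
E(X^2) = \int x^2 \, dP(x) = \frac12 \int x^2 \, dP \circ S_1^{-1}(x) + \frac12 \int x^2 \, dP \circ S_2^{-1}(x)
= \frac12 \int \frac{16}{81} x^2 \, dP(x) + \frac12 \int \left(\frac49 x + \frac59\right)^2 dP(x)
\]
\[
= \frac{16}{81} E(X^2) + \frac{20}{81} E(X) + \frac{25}{162} = \frac{16}{81} E(X^2) + \frac{20}{162} + \frac{25}{162},
\]

which implies $E(X^2) = \frac{9}{26}$ and hence $V(X) = E(X - E(X))^2 = E(X^2) - (E(X))^2 = \frac{9}{26} - \left(\frac12\right)^2 = \frac{5}{52}$. Then, following the standard theory of probability, we have $\int (x - x_0)^2 \, dP(x) = V(X) + (x_0 - E(X))^2$, and thus the lemma is yielded. □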
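The two fixed-point equations in the proof can be solved for a general ratio $r$. The sketch below is an illustration under the assumption that $S_1(x) = rx$ and $S_2(x) = rx + (1 - r)$ with equal weights; the function name `moments` is hypothetical. For $r = \frac49$ it reproduces the values of Lemma 2.2.

```python
from fractions import Fraction

def moments(r):
    """Mean and variance of P from the self-similarity relation P = (P o S1^-1 + P o S2^-1) / 2."""
    c = 1 - r
    # E(X) = r E(X) + c / 2, the average of E(S_1(X)) and E(S_2(X))
    mean = (c / 2) / (1 - r)
    # E(X^2) = r^2 E(X^2) + r c E(X) + c^2 / 2
    second = (r * c * mean + c * c / 2) / (1 - r * r)
    return mean, second - mean ** 2
```

Calling `moments(Fraction(1, 3))` gives the variance of the classical Cantor distribution, which offers an independent sanity check of the recursion.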


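Corollary 2.3. Let $\sigma \in \{1, 2\}^*$. Then, for any $x_0 \in \mathbb{R}$,

\[
\int_{J_\sigma} (x - x_0)^2 \, dP(x) = p_\sigma \left( s_\sigma^2 V + \left(S_\sigma(\tfrac12) - x_0\right)^2 \right). \tag{3}
\]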

Note 2.4. Notice that from the above lemma it follows that the optimal set of one-mean is the expected value and the corresponding quantization error is the variance $V$ of the random variable $X$. For $\sigma \in \{1, 2\}^k$, $k \ge 1$, since $a(\sigma) = E(X : X \in J_\sigma)$, using Lemma 2.1, we have

\[
a(\sigma) = \frac{1}{P(J_\sigma)} \int_{J_\sigma} x \, dP(x) = \int_{J_\sigma} x \, dP \circ S_\sigma^{-1}(x) = \int S_\sigma(x) \, dP(x) = E(S_\sigma(X)).
\]

Since $S_1$ and $S_2$ are similarity mappings, it is easy to see that $E(S_j(X)) = S_j(E(X))$ for $j = 1, 2$
and so by induction, $a(\sigma) = E(S_\sigma(X)) = S_\sigma(E(X)) = S_\sigma(\frac12)$ for $\sigma \in \{1, 2\}^k$, $k \ge 1$.
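Since $a(\sigma) = S_\sigma(\frac12)$ and $P(J_\sigma) = 2^{-|\sigma|}$, the conditional expectation (2) over a union of disjoint basic intervals reduces to a weighted average. A minimal sketch, assuming the words passed in label pairwise disjoint intervals (the helper names `S` and `a` are chosen to mirror the notation and are not part of the paper):

```python
from fractions import Fraction

def S(word, x, r):
    """Apply S_sigma to x for S_1(x) = r x, S_2(x) = r x + (1 - r)."""
    for symbol in reversed(word):
        x = r * x if symbol == "1" else r * x + (1 - r)
    return x

def a(*words, r=Fraction(4, 9)):
    """a(beta, gamma, ...): mass-weighted mean of the centroids S_w(1/2) of the J_w."""
    mass = sum(Fraction(1, 2 ** len(w)) for w in words)
    weighted = sum(Fraction(1, 2 ** len(w)) * S(w, Fraction(1, 2), r) for w in words)
    return weighted / mass
```

With a single word the function returns the midpoint of the corresponding basic interval, in line with Note 2.4.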

In the next section, Proposition 3.3, Proposition 3.11 and Proposition 3.12 determine the
centroidal Voronoi tessellations with $n$ generators for the probability measure $P$ for all $n \ge 2$.

3. Centroidal Voronoi tessellations for all $n \ge 2$

In this section, we determine the CVTs with $n$-means for each $n \ge 2$ of the Cantor set $C$ generated by the two mappings $S_1$ and $S_2$ defined by $S_1(x) = \frac49 x$ and $S_2(x) = \frac49 x + \frac59$ for $x \in \mathbb{R}$. As the probability distribution $P$ has support the Cantor set $C$ and $C \subset J$, a CVT of $J$ with respect to the probability distribution is also a CVT of $C$ and vice versa. Once we know a CVT, using the formula (3), the corresponding distortion error can easily be obtained. Write

$A_\sigma := \{a(\sigma 11, \sigma 121, \sigma 1221), \ a(\sigma 1222, \sigma 21), \ a(\sigma 22)\}$, or
$A_\sigma := \{a(\sigma 11), \ a(\sigma 12, \sigma 2111), \ a(\sigma 2112, \sigma 212, \sigma 22)\}$.

If $\sigma$ is the empty word $\emptyset$, then we have

$A := A_\emptyset = \{a(11, 121, 1221), \ a(1222, 21), \ a(22)\}$, or
$A := A_\emptyset = \{a(11), \ a(12, 2111), \ a(2112, 212, 22)\}$.

Let us now prove the following lemma. 

Lemma 3.1. Similarity mappings preserve the ratio of the distances of a point from any other 
two points. 

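Proof. Let $a, b, c \in \mathbb{R}$. For any $\sigma \in \{1, 2\}^k$ it is enough to prove that $\frac{|S_\sigma(b) - S_\sigma(a)|}{|S_\sigma(c) - S_\sigma(b)|} = \frac{|b - a|}{|c - b|}$, which is clearly true, since

\[
\frac{|S_\sigma(b) - S_\sigma(a)|}{|S_\sigma(c) - S_\sigma(b)|} = \frac{s_\sigma |b - a|}{s_\sigma |c - b|} = \frac{|b - a|}{|c - b|}.
\]

□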

Remark 3.2. The two mappings $S_1$ and $S_2$ defined in this paper are increasing mappings, i.e., for any $x, y \in \mathbb{R}$, $x < y$ implies $S_i(x) < S_i(y)$ for $i = 1, 2$, and so by Lemma 3.1, if $a < b < c$, we have

\[
(c - b)(S_\sigma(b) - S_\sigma(a)) = (b - a)(S_\sigma(c) - S_\sigma(b)).
\]

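Proposition 3.3. Let $n \in \mathbb{N}$ and $n = 2^k$ for some $k \in \mathbb{N}$. Then, $\alpha_n = \{S_\sigma(\frac12) : \sigma \in \{1, 2\}^k\}$ forms a unique optimal CVT with $n$-means with distortion error $V_n = \left(\frac49\right)^{2k} V$.

Proof. By Remark 3.2, for any $\sigma \in \{1, 2\}^k$ we have

\[
\left(1 - \tfrac12\right)\left(S_\sigma(\tfrac12) - S_\sigma(0)\right) = \left(\tfrac12 - 0\right)\left(S_\sigma(1) - S_\sigma(\tfrac12)\right),
\]

which implies $S_\sigma(\frac12) = \frac12 (S_\sigma(0) + S_\sigma(1))$, i.e., $S_\sigma(\frac12)$ are the midpoints of the basic intervals $J_\sigma$ for all $\sigma \in \{1, 2\}^k$. In addition, by Remark 1.2 and Note 2.4, $S_\sigma(\frac12)$ is the centroid of $J_\sigma$. Thus, the set $\{S_\sigma(\frac12) : \sigma \in \{1, 2\}^k\}$ forms a CVT of the Cantor set. Moreover, by Corollary 2.3, for $a \in \mathbb{R}$, $\int_{J_\sigma} (x - a)^2 \, dP$ is minimum when $a = S_\sigma(\frac12)$. Hence, the set $\alpha_n$ forms a unique optimal CVT of the Cantor set, and then

\[
\int \min_{a \in \alpha_n} \|x - a\|^2 \, dP = \sum_{\sigma \in \{1,2\}^k} \int_{J_\sigma} \min_{a \in \alpha_n} (x - a)^2 \, dP = \sum_{\sigma \in \{1,2\}^k} p_\sigma s_\sigma^2 V = \left(\frac49\right)^{2k} V.
\]

This completes the proof of the proposition. □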

Lemma 3.4. Let $A = \{a(11, 121, 1221), a(1222, 21), a(22)\}$ or $A = \{a(11), a(12, 2111), a(2112, 212, 22)\}$. Then, $A$ forms a CVT with three-means of the Cantor set $C$.

Proof. We have

\[
S_{1221}(1) = 0.395671 < \tfrac12 \left(a(11, 121, 1221) + a(1222, 21)\right) = 0.400854 < S_{1222}(0) = 0.405426,
\]
\[
S_{21}(1) = 0.753086 < \tfrac12 \left(a(1222, 21) + a(22)\right) = 0.754839 < S_{22}(0) = 0.802469.
\]

Thus, $A = \{a(11, 121, 1221), a(1222, 21), a(22)\}$ forms a CVT of $C$. Due to symmetry, $A = \{a(11), a(12, 2111), a(2112, 212, 22)\}$ also forms a CVT of $C$. □
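The two chains of inequalities in the proof of Lemma 3.4 can be checked exactly. The following sketch assumes each group of words is listed from left to right, so that the Voronoi boundary between two neighbouring generators must fall in the gap between the last interval of one group and the first interval of the next; `is_cvt`, `A_first` and `A_second` are illustrative names, not notation from the paper.

```python
from fractions import Fraction

R = Fraction(4, 9)

def S(word, x, r=R):
    """Apply S_sigma to x for S_1(x) = r x, S_2(x) = r x + (1 - r)."""
    for symbol in reversed(word):
        x = r * x if symbol == "1" else r * x + (1 - r)
    return x

def a(*words, r=R):
    """Conditional mean of X given the union of the (disjoint) intervals J_w."""
    mass = sum(Fraction(1, 2 ** len(w)) for w in words)
    return sum(Fraction(1, 2 ** len(w)) * S(w, Fraction(1, 2), r) for w in words) / mass

def is_cvt(groups, r=R):
    """Check that each midpoint of consecutive generators separates the two groups."""
    centers = [a(*g, r=r) for g in groups]
    for left, right, cl, cr in zip(groups, groups[1:], centers, centers[1:]):
        mid = (cl + cr) / 2
        # right end of the last interval on the left < mid < left end of the first on the right
        if not (S(left[-1], 1, r) < mid < S(right[0], 0, r)):
            return False
    return True

A_first = [("11", "121", "1221"), ("1222", "21"), ("22",)]
A_second = [("11",), ("12", "2111"), ("2112", "212", "22")]
```

Because every generator is by construction the conditional mean of its group, the separation test is all that remains to be verified for the centroid condition.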
Lemma 3.5. The set $\{a(111, 1121, 11221), a(11222, 121), a(122), a(21), a(22)\}$ forms a CVT with five-means.

Proof. We have $S_1^{-1}(\{a(111, 1121, 11221), a(11222, 121), a(122)\}) = \{a(11, 121, 1221), a(1222, 21), a(22)\}$. So, by Lemma 3.4, the set $S_1^{-1}(\{a(111, 1121, 11221), a(11222, 121), a(122)\})$ forms a CVT with three-means. Similarly, by Proposition 3.3, the set $S_2^{-1}(\{a(21), a(22)\}) = \{a(1), a(2)\}$ forms a CVT with two-means. By Lemma 3.1, we know that the similarity mappings preserve the ratio of the distances of a point from any other two points, and so the set $\{a(111, 1121, 11221), a(11222, 121), a(122)\}$ forms a CVT with three-means of $J_1$ and the set $\{a(21), a(22)\}$ forms a CVT with two-means of $J_2$. Thus, the union of the CVTs of $J_1$ and $J_2$ will form a CVT of the Cantor set $C$ if we can prove that

\[
S_{122}(1) < \tfrac12 \left(a(122) + a(21)\right) < S_{21}(0),
\]

which is clearly true since

\[
S_{122}(1) = 0.444444 < \tfrac12 \left(a(122) + a(21)\right) = 0.527435 < S_{21}(0) = 0.555556.
\]

Hence the given set forms a CVT with five-means. □

Remark 3.6. Similarly, we can prove that the sets

$\{a(111), a(112, 12111), a(12112, 1212, 122), a(21), a(22)\}$,
$\{a(11), a(12), a(211, 2121, 21221), a(21222, 221), a(222)\}$,
$\{a(11), a(12), a(211), a(212, 22111), a(22112, 2212, 222)\}$

also form CVTs with five-means, i.e., the number of CVTs with five-means is four.

Lemma 3.7. The set $\{a(111, 1121, 11221), a(11222, 121), a(122), a(211, 2121, 21221), a(21222, 221), a(222)\}$ forms a CVT with six-means.

Proof. The set $\{a(11, 121, 1221), a(1222, 21), a(22)\}$ forms a CVT of $J$ with three-means. So, by Lemma 3.1, the sets $S_1(\{a(11, 121, 1221), a(1222, 21), a(22)\})$ and $S_2(\{a(11, 121, 1221), a(1222, 21), a(22)\})$ form CVTs of $J_1$ and $J_2$, respectively. Thus, the given set will form a CVT with six-means if we can prove that

\[
S_{122}(1) < \tfrac12 \left(a(122) + a(211, 2121, 21221)\right) < S_{211}(0),
\]

which is clearly true since

\[
S_{122}(1) = 0.444444 < \tfrac12 \left(a(122) + a(211, 2121, 21221)\right) = 0.521 < S_{211}(0) = 0.555556.
\]

Thus, the lemma is yielded. □

Remark 3.8. By Lemma 3.4, since there are two different CVTs of $J$ with three-means, one
can say that each of the basic intervals $J_1$ and $J_2$ has two different CVTs, and thus using all
possible combinations one can see that the total number of CVTs with six-means is four.

Lemma 3.9. The set $\{a(111, 1121, 11221), a(11222, 121), a(122), a(211), a(212), a(221), a(222)\}$ forms a CVT with seven-means.

Proof. The set $\{a(11, 121, 1221), a(1222, 21), a(22)\}$ forms a CVT of $J$ with three-means. So, by Lemma 3.1, the set $\{a(111, 1121, 11221), a(11222, 121), a(122)\}$ forms a CVT of $J_1$ with three-means. Again, by Proposition 3.3, the set $\{a(211), a(212), a(221), a(222)\}$ forms a CVT of $J_2$ with four-means. Hence, the union of the two CVTs will form a CVT with seven-means if we can prove that

\[
S_{122}(1) < \tfrac12 \left(a(122) + a(211)\right) < S_{211}(0),
\]

which is clearly true since

\[
S_{122}(1) = 0.444444 < \tfrac12 \left(a(122) + a(211)\right) = 0.5 < S_{211}(0) = 0.555556.
\]

Thus, the lemma is obtained. □

Remark 3.10. Each $J_i$ for $i = 1, 2$ has two different CVTs, and so using all possible combinations
we see that the total number of CVTs with seven-means is four.

Let us now prove the following two propositions. 

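Proposition 3.11. Let $n \in \mathbb{N}$ be such that $n = 2^{\ell(n)} + k$, where $1 \le k \le 2^{\ell(n)-1}$. Let $I \subset \{1, 2\}^{\ell(n)-1}$ with $\operatorname{card}(I) = k = n - 2^{\ell(n)}$. Then, the set $\alpha_n := \alpha_n(I)$, where

\[
\alpha_n(I) = \begin{cases}
\bigcup_{\sigma \in I} A_\sigma \cup \left\{S_{\sigma 1}(\tfrac12), S_{\sigma 2}(\tfrac12) : \sigma \in \{1, 2\}^{\ell(n)-1} \setminus I\right\} & \text{if } 1 \le k < 2^{\ell(n)-1}, \\
\bigcup_{\sigma \in I} A_\sigma & \text{if } k = 2^{\ell(n)-1} \ne 1, \\
A_\emptyset & \text{if } k = 2^{\ell(n)-1} = 1, \text{ i.e., when } n = 3,
\end{cases}
\]

forms a CVT with $n$-means. The number of CVTs for $1 \le k < 2^{\ell(n)-1}$ is $2^{n - 2^{\ell(n)}} \times {}^{2^{\ell(n)-1}}C_{n - 2^{\ell(n)}}$, and the number of CVTs for $k = 2^{\ell(n)-1}$ is $2^{2^{\ell(n)-1}}$.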

Proof. Let us first assume that $n = 2^{\ell(n)} + k$, where $1 \le k < 2^{\ell(n)-1}$. Let $I \subset \{1, 2\}^{\ell(n)-1}$ with $\operatorname{card}(I) = n - 2^{\ell(n)}$. By Lemma 3.4, for each $\sigma \in I$, the set $S_\sigma^{-1}(A_\sigma)$ forms a CVT of $J$ with three-means, and so by Lemma 3.1, the set $A_\sigma$ forms a CVT of $J_\sigma$ with three-means. Now, proceeding in the similar way as Lemma 3.7, we can say that the set $\bigcup_{\sigma \in I} A_\sigma$ forms a CVT of $\bigcup_{\sigma \in I} J_\sigma$. Again, for each $\sigma \in \{1, 2\}^{\ell(n)-1} \setminus I$ the set $S_\sigma^{-1}(\{S_{\sigma 1}(\frac12), S_{\sigma 2}(\frac12)\})$ forms a CVT of $J$ with two-means, and so by Lemma 3.1, the set $\{S_{\sigma 1}(\frac12), S_{\sigma 2}(\frac12)\}$ forms a CVT of $J_\sigma$ with two-means. Now, proceeding in the similar way as Lemma 3.5, we can say that the set $\bigcup_{\sigma \in I} A_\sigma \cup \{S_{\sigma 1}(\frac12), S_{\sigma 2}(\frac12) : \sigma \in \{1, 2\}^{\ell(n)-1} \setminus I\}$ forms a CVT with $n$-means. Notice that $\operatorname{card}(I) = n - 2^{\ell(n)}$ and $I$ can be chosen in ${}^{2^{\ell(n)-1}}C_{n - 2^{\ell(n)}}$ ways. For each $\sigma \in I$ there are two different choices for $A_\sigma$. Hence, the number of CVTs in this case is $2^{n - 2^{\ell(n)}} \times {}^{2^{\ell(n)-1}}C_{n - 2^{\ell(n)}}$. Similarly, by Lemma 3.1, Lemma 3.4, and proceeding in the similar way as Lemma 3.7, we can prove that if $n = 2^{\ell(n)} + k$, where $k = 2^{\ell(n)-1} \ne 1$ or $k = 2^{\ell(n)-1} = 1$, the set $\bigcup_{\sigma \in I} A_\sigma$ forms a CVT of $C$, and the number of CVTs in either case is given by $2^{2^{\ell(n)-1}}$. Hence, the proposition is yielded. □

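Proposition 3.12. Let $n \in \mathbb{N}$ be such that $n = 3 \cdot 2^{\ell(n)-1} + k$, where $1 \le k \le 2^{\ell(n)-1} - 1$. Let $I \subset \{1, 2\}^{\ell(n)-1}$ with $\operatorname{card}(I) = n - 3 \cdot 2^{\ell(n)-1}$. Then, the set $\alpha_n := \alpha_n(I)$, where

\[
\alpha_n(I) = \Big(\bigcup_{\sigma \in \{1,2\}^{\ell(n)-1} \setminus I} A_\sigma\Big) \cup \left\{S_{\sigma\tau}(\tfrac12) : \sigma \in I \text{ and } \tau \in \{1, 2\}^2\right\},
\]

forms a CVT with $n$-means. The number of such sets is $2^{2^{\ell(n)+1} - n} \times {}^{2^{\ell(n)-1}}C_{n - 3 \cdot 2^{\ell(n)-1}}$.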

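Proof. Let $n = 3 \cdot 2^{\ell(n)-1} + k$, where $1 \le k \le 2^{\ell(n)-1} - 1$. Let $I \subset \{1, 2\}^{\ell(n)-1}$ with $\operatorname{card}(I) = n - 3 \cdot 2^{\ell(n)-1}$. For each $\sigma \in \{1, 2\}^{\ell(n)-1} \setminus I$ we have $S_\sigma^{-1}(A_\sigma) = A$, which is a CVT of $C$ with three-means, and so the set $A_\sigma$ forms a CVT of $J_\sigma$ for each $\sigma \in \{1, 2\}^{\ell(n)-1} \setminus I$. Again, the set $\{S_{\sigma\tau}(\frac12) : \tau \in \{1, 2\}^2\}$ forms a CVT with four-means of $J_\sigma$ for each $\sigma \in I$. Thus, proceeding in the similar way as Lemma 3.9, we can prove that the set $\alpha_n(I) = \big(\bigcup_{\sigma \in \{1,2\}^{\ell(n)-1} \setminus I} A_\sigma\big) \cup \{S_{\sigma\tau}(\frac12) : \sigma \in I \text{ and } \tau \in \{1, 2\}^2\}$ forms a CVT of $C$ with $n$-means. Notice that $I$ can be chosen in ${}^{2^{\ell(n)-1}}C_{n - 3 \cdot 2^{\ell(n)-1}}$ ways and $\operatorname{card}(\{1, 2\}^{\ell(n)-1} \setminus I) = 2^{\ell(n)-1} - (n - 3 \cdot 2^{\ell(n)-1}) = 2^{\ell(n)+1} - n$. For each $\sigma \in \{1, 2\}^{\ell(n)-1} \setminus I$ there are two different choices for $A_\sigma$. Hence, the number of CVTs in this case is $2^{2^{\ell(n)+1} - n} \times {}^{2^{\ell(n)-1}}C_{n - 3 \cdot 2^{\ell(n)-1}}$. Thus, the proposition is yielded. □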
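The counting formulas of Proposition 3.11 and Proposition 3.12 can be tabulated for small $n$. This is a sketch under the assumption that $\ell(n)$ is the unique integer with $2^{\ell(n)} \le n < 2^{\ell(n)+1}$; it counts only the sets of the form $\alpha_n(I)$ described in this section (with the single set of Proposition 3.3 when $n = 2^k$). The function names `ell` and `count_cvts` are hypothetical.

```python
from math import comb

def ell(n):
    """Unique l with 2^l <= n < 2^(l+1)."""
    return n.bit_length() - 1

def count_cvts(n):
    """Number of sets alpha_n(I) from Propositions 3.3, 3.11 and 3.12, for n >= 2."""
    l = ell(n)
    half = 2 ** (l - 1)
    if n == 2 ** l:                       # Proposition 3.3: a single set
        return 1
    k = n - 2 ** l
    if k < half:                          # Proposition 3.11 with 1 <= k < 2^(l-1)
        return 2 ** k * comb(half, k)
    if k == half:                         # Proposition 3.11 with k = 2^(l-1)
        return 2 ** half
    k = n - 3 * half                      # Proposition 3.12
    return 2 ** (2 ** (l + 1) - n) * comb(half, k)
```

For $n = 5, 6, 7$ the function returns four in each case, in agreement with Remark 3.6, Remark 3.8 and Remark 3.10.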

Let us now prove the following lemma. 
















Lemma 3.13. Let $P = \frac12 P \circ S_1^{-1} + \frac12 P \circ S_2^{-1}$ be the probability measure supported by the Cantor set generated by $S_1(x) = rx$ and $S_2(x) = rx + (1 - r)$. Let $n \in \mathbb{N}$, $n \ge 2$ and $n$ is not of the form $2^{\ell(n)}$ for any $\ell(n) \in \mathbb{N}$. Then, $\alpha_n(I)$, given by Proposition 3.11 or Proposition 3.12, for each $n \ge 2$ forms a CVT if $0.4364590141 < r < 0.4512271429$ (written up to ten decimal places).

Proof. Let $\alpha_3(I) = \{a(11, 121, 1221), a(1222, 21), a(22)\}$. It forms a CVT if

\[
S_{1221}(1) < \tfrac12 \left(a(11, 121, 1221) + a(1222, 21)\right) < S_{1222}(0)
\]
\[
\text{and} \quad S_{21}(1) < \tfrac12 \left(a(1222, 21) + a(22)\right) < S_{22}(0),
\]

i.e., if $0.4364590141 < r < 0.4521904271$ and $0.2076973455 < r < 0.4512271429$, which yields $0.4364590141 < r < 0.4512271429$. Similarly, if $\alpha_3(I) = \{a(11), a(12, 2111), a(2112, 212, 22)\}$, it will form a CVT if $0.4364590141 < r < 0.4512271429$. By Lemma 3.1, we can say that the set $\alpha_n(I)$ for each $n \ge 2$ also forms a CVT if $0.4364590141 < r < 0.4512271429$, and thus the lemma is yielded. □
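The dependence of the two separation conditions on $r$ can be explored numerically. The sketch below is illustrative only: `alpha3_is_cvt` evaluates both conditions from the proof for a given ratio, and `lower_threshold` is a plain bisection (an assumption of this sketch, not a method stated in the paper) that presumes the predicate switches exactly once from false to true on the search interval.

```python
from fractions import Fraction

def S(word, x, r):
    """Apply S_sigma to x for S_1(x) = r x, S_2(x) = r x + (1 - r)."""
    for symbol in reversed(word):
        x = r * x if symbol == "1" else r * x + (1 - r)
    return x

def a(*words, r):
    """Conditional mean of X given the union of the (disjoint) intervals J_w."""
    mass = sum(Fraction(1, 2 ** len(w)) for w in words)
    return sum(Fraction(1, 2 ** len(w)) * S(w, Fraction(1, 2), r) for w in words) / mass

def alpha3_is_cvt(r):
    """Both conditions for alpha_3 = {a(11,121,1221), a(1222,21), a(22)} at ratio r."""
    m1 = (a("11", "121", "1221", r=r) + a("1222", "21", r=r)) / 2
    m2 = (a("1222", "21", r=r) + a("22", r=r)) / 2
    first = S("1221", 1, r) < m1 < S("1222", 0, r)
    second = S("21", 1, r) < m2 < S("22", 0, r)
    return first and second

def lower_threshold(lo=0.43, hi=0.44, steps=50):
    """Bisection estimate of the smallest r in [lo, hi] for which alpha3_is_cvt holds."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if alpha3_is_cvt(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

The value returned by `lower_threshold()` can be compared with the lower bound stated in Lemma 3.13; an analogous bisection on an interval above $\frac49$ would estimate the upper bound.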

Remark 3.14. Proposition 3.3, Proposition 3.11 and Proposition 3.12 give the CVTs with $n$-means for the probability distribution $P$ supported by the Cantor set generated by the mappings $S_1(x) = \frac49 x$ and $S_2(x) = \frac49 x + \frac59$ for any positive integer $n \ge 2$. Lemma 3.13 says that using the formula given in this paper, if $n$ is not of the form $2^{\ell(n)}$ for any $\ell(n) \in \mathbb{N}$, one can determine the CVTs with $n$-means for any $n \ge 2$, and hence the corresponding distortion error, for the probability measure $P$ supported by any Cantor set generated by $S_1(x) = rx$ and $S_2(x) = rx + (1 - r)$, where $0.4364590141 < r < 0.4512271429$.

4. Distortion errors for two different CVTs 


In this section we compare the distortion errors for two different CVTs with $n$-means: one is obtained using the formula given by Proposition 3.3, Proposition 3.11 or Proposition 3.12 in this paper, and one is obtained using the formula given in [GL2]. Let $P = \frac12 P \circ S_1^{-1} + \frac12 P \circ S_2^{-1}$ be the probability measure supported by the Cantor set generated by $S_1(x) = rx$ and $S_2(x) = rx + (1 - r)$. Then, it can be shown that if $V$ is the variance of a random variable with distribution $P$ in this case, then $V = \frac{1 - r}{4(1 + r)}$.

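Definition 4.1. For $n \in \mathbb{N}$ with $n \ge 2$ let $\ell(n)$ be the unique natural number with $2^{\ell(n)} \le n < 2^{\ell(n)+1}$. For $I \subset \{1, 2\}^{\ell(n)}$ with $\operatorname{card}(I) = n - 2^{\ell(n)}$ let $\beta_n(I)$ be the set consisting of all midpoints $a_\sigma$ of intervals $J_\sigma$ with $\sigma \in \{1, 2\}^{\ell(n)} \setminus I$ and all midpoints $a_{\sigma 1}$, $a_{\sigma 2}$ of the basic intervals of $J_\sigma$ with $\sigma \in I$. Formally,

\[
\beta_n(I) = \{a_\sigma : \sigma \in \{1, 2\}^{\ell(n)} \setminus I\} \cup \{a_{\sigma 1} : \sigma \in I\} \cup \{a_{\sigma 2} : \sigma \in I\}.
\]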
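Definition 4.1 translates directly into a small constructor. A minimal sketch, assuming words are encoded as strings over `'1'` and `'2'` and that the midpoint $a_\sigma$ equals $S_\sigma(\frac12)$; the function name `beta` is illustrative.

```python
from fractions import Fraction
from itertools import product

def S(word, x, r):
    """Apply S_sigma to x for S_1(x) = r x, S_2(x) = r x + (1 - r)."""
    for symbol in reversed(word):
        x = r * x if symbol == "1" else r * x + (1 - r)
    return x

def beta(n, I, r):
    """Sorted list of the points of beta_n(I); I is a set of words of length l(n)."""
    l = n.bit_length() - 1                      # 2^l <= n < 2^(l+1)
    if len(I) != n - 2 ** l or any(len(w) != l for w in I):
        raise ValueError("I must contain n - 2^l words of length l")
    half = Fraction(1, 2)
    points = []
    for letters in product("12", repeat=l):
        w = "".join(letters)
        # words in I are split into their two basic intervals, the others are kept whole
        children = [w + "1", w + "2"] if w in I else [w]
        points += [S(c, half, r) for c in children]
    return sorted(points)
```

For example, `beta(3, {"1"}, Fraction(1, 3))` lists the midpoints of $J_{11}$, $J_{12}$ and $J_2$ for the classical Cantor set.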

In [GL2], it was shown that $\beta_n(I)$ forms an optimal set of $n$-means for $r = \frac13$. Let us now
prove the following lemma.


Lemma 4.2. Let $\beta_n(I)$ be the set given by Definition 4.1. Then, $\beta_n(I)$ forms a CVT with $n$-means for each $n \ge 2$ if $0 < r \le \frac{5 - \sqrt{17}}{2}$, i.e., if $0 < r < 0.4384471872$ (written up to ten decimal places).

Proof. Let $a_{11}$ be the midpoint of $J_{11}$, $a_{12}$ be the midpoint of $J_{12}$, and $a_2$ be the midpoint of $J_2$. Then, $\beta_3(\{1\}) = \{a_{11}, a_{12}, a_2\}$, and it will form a CVT if $S_{12}(1) \le \frac12 (a_{12} + a_2) \le S_2(0)$, which implies $r \le \frac14 (-r^2 + r + 2) \le 1 - r$, which after simplification yields $0 < r \le \frac{5 - \sqrt{17}}{2}$, i.e., $0 < r < 0.4384471872$. Thus, by Lemma 3.1, it can be seen that if $0 < r < 0.4384471872$, $\beta_n(I)$ for each $n \ge 2$ also forms a CVT, and thus the lemma is yielded. □
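The simplification step in the proof can be spelled out as follows; this worked derivation only uses $a_{12} = S_{12}(\frac12) = r\left(1 - \frac r2\right)$ and $a_2 = S_2(\frac12) = 1 - \frac r2$, which follow from the definitions of $S_1$ and $S_2$.

```latex
\begin{align*}
\tfrac12 (a_{12} + a_2)
  &= \tfrac12 \Big( r - \tfrac{r^2}{2} + 1 - \tfrac{r}{2} \Big)
   = \tfrac14 \big( -r^2 + r + 2 \big), \\
r \le \tfrac14 \big( -r^2 + r + 2 \big)
  &\iff r^2 + 3r - 2 \le 0
   \iff r \le \tfrac{-3 + \sqrt{17}}{2} \approx 0.5616, \\
\tfrac14 \big( -r^2 + r + 2 \big) \le 1 - r
  &\iff r^2 - 5r + 2 \ge 0
   \iff r \le \tfrac{5 - \sqrt{17}}{2} \approx 0.4384
   \quad (\text{for } 0 < r < 1).
\end{align*}
```

The second condition is the binding one, which is why the bound $\frac{5 - \sqrt{17}}{2}$ appears in the statement of the lemma.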

Let us now prove the following proposition. 
Proposition 4.3. Let $\alpha_n(I)$ be the set as defined by Proposition 3.3, Proposition 3.11 or Proposition 3.12, and $\beta_n(I)$ be the set given by Definition 4.1. Suppose $n$ is not of the form $2^{\ell(n)}$ for any positive integer $\ell(n)$. Then, $V(P, \alpha_n(I)) < V(P, \beta_n(I))$ if $0.4371985206 < r < 0.4384471872$, where $V(P, \alpha_n(I))$ and $V(P, \beta_n(I))$ respectively denote the distortion errors for the sets $\alpha_n(I)$ and $\beta_n(I)$.

Proof. If $n$ is of the form $2^{\ell(n)}$ for some positive integer $\ell(n)$, then it is easy to see that $V(P, \alpha_n(I)) = V(P, \beta_n(I))$. Let us assume that $n$ is not of the form $2^{\ell(n)}$ for any positive integer $\ell(n)$. To prove $V(P, \alpha_n(I)) < V(P, \beta_n(I))$, due to Lemma 3.1, it is enough to prove the inequality for $n = 3$, then it will be satisfied for all other values $n = 5, 6, 7, 9, 10$, etc., which are not of the form $2^{\ell(n)}$. Notice that in $\alpha_3(I)$ as defined in Proposition 3.11 the set $I$ consists of the empty word only. By Lemma 3.4, take $\alpha_3(I) = \{a(11, 121, 1221), a(1222, 21), a(22)\}$. Then, using (1) and (3), we have

\begin{align*}
V(P, \alpha_3(I)) &= V(P, \{a(11, 121, 1221), a(1222, 21), a(22)\}) \\
&= \int_{J_{11} \cup J_{121} \cup J_{1221}} (x - a(11, 121, 1221))^2 \, dP(x) + \int_{J_{1222} \cup J_{21}} (x - a(1222, 21))^2 \, dP(x) \\
&\quad + \int_{J_{22}} (x - a(22))^2 \, dP(x) \\
&= \int_{J_{11}} (x - a(11, 121, 1221))^2 \, dP(x) + \int_{J_{121}} (x - a(11, 121, 1221))^2 \, dP(x) \\
&\quad + \int_{J_{1221}} (x - a(11, 121, 1221))^2 \, dP(x) + \int_{J_{1222}} (x - a(1222, 21))^2 \, dP(x) \\
&\quad + \int_{J_{21}} (x - a(1222, 21))^2 \, dP(x) + \int_{J_{22}} (x - a(22))^2 \, dP(x),
\end{align*}

which implies

\begin{align*}
V(P, \alpha_3(I)) &= \frac{1}{2^2} \left(r^4 V + (a(11) - a(11, 121, 1221))^2\right) + \frac{1}{2^3} \left(r^6 V + (a(121) - a(11, 121, 1221))^2\right) \\
&\quad + \frac{1}{2^4} \left(r^8 V + (a(1221) - a(11, 121, 1221))^2\right) + \frac{1}{2^4} \left(r^8 V + (a(1222) - a(1222, 21))^2\right) \\
&\quad + \frac{1}{2^2} \left(r^4 V + (a(21) - a(1222, 21))^2\right) + \frac{1}{2^2} r^4 V.
\end{align*}

Now, use (2), and simplify to obtain

\[
V(P, \alpha_3(I)) = \frac{-3r^9 - 3r^8 + 14r^7 - 22r^6 - 71r^5 + 49r^4 + 4r^3 + 88r^2 - 84r + 28}{560(r + 1)}. \tag{4}
\]

To calculate $V(P, \beta_3(I))$ we take $I = \{1\}$, then $\beta_3(I) = \{a(11), a(12), a(2)\}$. Thus,

\[
V(P, \beta_3(I)) = \int_{J_{11}} (x - a(11))^2 \, dP + \int_{J_{12}} (x - a(12))^2 \, dP + \int_{J_2} (x - a(2))^2 \, dP = \frac12 r^4 V + \frac12 r^2 V.
\]

We see that $V(P, \alpha_3(I)) < V(P, \beta_3(I))$ if $0.4371985206 < r$. Combining this with the values of $r$ in Lemma 4.2, we see that $V(P, \alpha_3(I)) < V(P, \beta_3(I))$ if $0.4371985206 < r < 0.4384471872$, which yields the proposition. □
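The term-by-term sum obtained from formula (3) and the closed form can be compared numerically. The sketch below assumes the general-$r$ variance $V = \frac{1 - r}{4(1 + r)}$ and the closed form (4) as reconstructed from the proof; the names `distortion`, `closed_form_alpha3`, `ALPHA3` and `BETA3` are illustrative, and the code evaluates the distortion of a grouping whether or not that grouping is a CVT at the given $r$.

```python
from fractions import Fraction

def S(word, x, r):
    """Apply S_sigma to x for S_1(x) = r x, S_2(x) = r x + (1 - r)."""
    for symbol in reversed(word):
        x = r * x if symbol == "1" else r * x + (1 - r)
    return x

def variance(r):
    """Variance of P for similarity ratio r."""
    return (1 - r) / (4 * (1 + r))

def distortion(groups, r):
    """Sum of formula (3) over all basic intervals, each measured from its group's mean."""
    V, half, total = variance(r), Fraction(1, 2), 0
    for words in groups:
        mass = sum(Fraction(1, 2 ** len(w)) for w in words)
        center = sum(Fraction(1, 2 ** len(w)) * S(w, half, r) for w in words) / mass
        for w in words:
            p_w = Fraction(1, 2 ** len(w))
            total += p_w * (r ** (2 * len(w)) * V + (S(w, half, r) - center) ** 2)
    return total

ALPHA3 = [("11", "121", "1221"), ("1222", "21"), ("22",)]
BETA3 = [("11",), ("12",), ("2",)]

def closed_form_alpha3(r):
    """Right-hand side of (4)."""
    num = (-3 * r**9 - 3 * r**8 + 14 * r**7 - 22 * r**6 - 71 * r**5
           + 49 * r**4 + 4 * r**3 + 88 * r**2 - 84 * r + 28)
    return num / (560 * (r + 1))
```

Evaluating `distortion(ALPHA3, r)` and `distortion(BETA3, r)` on a grid of ratios near $0.437$ gives a quick way to see where the two distortion errors cross.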


Remark 4.4. Proposition 4.3 says that if $0.4371985206 < r < 0.4384471872$ and $n$ is not of the form $2^{\ell(n)}$ for any positive integer $\ell(n)$, then the distortion error for the CVT $\alpha_n(I)$ obtained in this paper is less than the distortion error for the CVT obtained using the formula in [GL2]. But, until now it is not known whether this $\alpha_n(I)$ forms an optimal CVT with $n$-means for $0.4371985206 < r < 0.4384471872$. In the following section, in Theorem 5.2, we give an answer to it.
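5. The CVT $\alpha_n(I)$ does not form an optimal CVT

In this section, we show that if $n$ is not of the form $2^{\ell(n)}$ for any positive integer $\ell(n)$, then the CVT $\alpha_n(I)$ does not form an optimal CVT for $0.4371985206 < r \le \frac{5 - \sqrt{17}}{2} \approx 0.4384471872$.

Write

$C_\sigma := \{a(\sigma 11, \sigma 1211, \sigma 12121), \ a(\sigma 12122, \sigma 122, \sigma 211), \ a(\sigma 212, \sigma 22)\}$, or
$C_\sigma := \{a(\sigma 11, \sigma 121), \ a(\sigma 122, \sigma 211, \sigma 21211), \ a(\sigma 21212, \sigma 2122, \sigma 22)\}$.

If $\sigma$ is the empty word $\emptyset$, then we have

$C := C_\emptyset = \{a(11, 1211, 12121), \ a(12122, 122, 211), \ a(212, 22)\}$, or
$C := C_\emptyset = \{a(11, 121), \ a(122, 211, 21211), \ a(21212, 2122, 22)\}$.

Let $n \in \mathbb{N}$. If $n = 2^{\ell(n)} + k$, where $1 \le k \le 2^{\ell(n)-1}$, then write

\[
\delta_n(I) = \begin{cases}
\bigcup_{\sigma \in I} C_\sigma \cup \left\{S_{\sigma 1}(\tfrac12), S_{\sigma 2}(\tfrac12) : \sigma \in \{1, 2\}^{\ell(n)-1} \setminus I\right\} & \text{if } 1 \le k < 2^{\ell(n)-1}, \\
\bigcup_{\sigma \in I} C_\sigma & \text{if } k = 2^{\ell(n)-1} \ne 1, \\
C_\emptyset & \text{if } k = 2^{\ell(n)-1} = 1, \text{ i.e., when } n = 3,
\end{cases}
\]

where $I \subset \{1, 2\}^{\ell(n)-1}$ with $\operatorname{card}(I) = k = n - 2^{\ell(n)}$. If $n = 3 \cdot 2^{\ell(n)-1} + k$, where $1 \le k \le 2^{\ell(n)-1} - 1$, then write

\[
\delta_n(I) = \Big(\bigcup_{\sigma \in \{1,2\}^{\ell(n)-1} \setminus I} C_\sigma\Big) \cup \left\{S_{\sigma\tau}(\tfrac12) : \sigma \in I \text{ and } \tau \in \{1, 2\}^2\right\},
\]

where $I \subset \{1, 2\}^{\ell(n)-1}$ with $\operatorname{card}(I) = k = n - 3 \cdot 2^{\ell(n)-1}$.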

We now prove the following proposition. 

Proposition 5.1. Let $0.4364590141 < r < 0.4486234903$. Then, both $\delta_n(I)$ and $\alpha_n(I)$ form CVTs. Moreover, if $n$ is not of the form $2^{\ell(n)}$ for any positive integer $\ell(n)$, then $V(P, \delta_n(I)) < V(P, \alpha_n(I))$ if $0.4364590141 < r < 0.4486234903$, where $V(P, \delta_n(I))$ and $V(P, \alpha_n(I))$ respectively denote the distortion errors for the CVTs $\delta_n(I)$ and $\alpha_n(I)$.

Proof. Let us first find the values of $r$ for which $\delta_n(I)$ forms a CVT. Proceeding in the similar way as Lemma 3.13, we see that $\delta_n(I)$ forms a CVT if $0.4332840530 < r < 0.4486234903$. Moreover, by Lemma 3.13, $\alpha_n(I)$ forms a CVT if $0.4364590141 < r < 0.4512271429$. Thus, both $\delta_n(I)$ and $\alpha_n(I)$ form CVTs if $0.4364590141 < r < 0.4486234903$. Now, to find the values of $r$ for which $V(P, \delta_n(I)) < V(P, \alpha_n(I))$, we proceed in the similar way as the proof of Proposition 4.3. Take $\delta_3(I) = \{a(11, 1211, 12121), a(12122, 122, 211), a(212, 22)\}$. Then, using (1) and (3), we have

\begin{align*}
V(P, \delta_3(I)) &= \frac{1}{2^2} \left(r^4 V + (a(11) - a(11, 1211, 12121))^2\right) + \frac{1}{2^4} \left(r^8 V + (a(1211) - a(11, 1211, 12121))^2\right) \\
&\quad + \frac{1}{2^5} \left(r^{10} V + (a(12121) - a(11, 1211, 12121))^2\right) + \frac{1}{2^5} \left(r^{10} V + (a(12122) - a(12122, 122, 211))^2\right) \\
&\quad + \frac{1}{2^3} \left(r^6 V + (a(122) - a(12122, 122, 211))^2\right) + \frac{1}{2^3} \left(r^6 V + (a(211) - a(12122, 122, 211))^2\right) \\
&\quad + \frac{1}{2^3} \left(r^6 V + (a(212) - a(212, 22))^2\right) + \frac{1}{2^2} \left(r^4 V + (a(22) - a(212, 22))^2\right).
\end{align*}

Then, using (2),

\[
V(P, \delta_3(I)) = -\frac{5r^{11} + 5r^{10} - 2r^9 + 18r^8 + 89r^7 + 21r^6 + 180r^5 - 48r^4 - 140r^3 - 568r^2 + 660r - 220}{3168(r + 1)}.
\]

Equation (4) gives $V(P, \alpha_3(I))$. Thus, we see that $V(P, \delta_3(I)) < V(P, \alpha_3(I))$ if $0.4307442489 < r$. Combining this with the values of $r$ for which both $\delta_3(I)$ and $\alpha_3(I)$ simultaneously form a CVT, we see that $V(P, \delta_3(I)) < V(P, \alpha_3(I))$ if $0.4364590141 < r < 0.4486234903$, which yields the proposition. □
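The comparison in the proof can be reproduced at a sample ratio. This sketch again assumes $V = \frac{1 - r}{4(1 + r)}$ and takes the closed form for $V(P, \delta_3(I))$ as reconstructed from the proof, including the leading minus sign; `DELTA3`, `ALPHA3`, `distortion` and `closed_form_delta3` are illustrative names.

```python
from fractions import Fraction

def S(word, x, r):
    """Apply S_sigma to x for S_1(x) = r x, S_2(x) = r x + (1 - r)."""
    for symbol in reversed(word):
        x = r * x if symbol == "1" else r * x + (1 - r)
    return x

def distortion(groups, r):
    """Sum of formula (3) over all basic intervals, each measured from its group's mean."""
    V = (1 - r) / (4 * (1 + r))          # variance of P for ratio r
    half, total = Fraction(1, 2), 0
    for words in groups:
        mass = sum(Fraction(1, 2 ** len(w)) for w in words)
        center = sum(Fraction(1, 2 ** len(w)) * S(w, half, r) for w in words) / mass
        for w in words:
            p_w = Fraction(1, 2 ** len(w))
            total += p_w * (r ** (2 * len(w)) * V + (S(w, half, r) - center) ** 2)
    return total

DELTA3 = [("11", "1211", "12121"), ("12122", "122", "211"), ("212", "22")]
ALPHA3 = [("11", "121", "1221"), ("1222", "21"), ("22",)]

def closed_form_delta3(r):
    """Closed form for V(P, delta_3(I)) as stated in the proof of Proposition 5.1."""
    num = (5 * r**11 + 5 * r**10 - 2 * r**9 + 18 * r**8 + 89 * r**7 + 21 * r**6
           + 180 * r**5 - 48 * r**4 - 140 * r**3 - 568 * r**2 + 660 * r - 220)
    return -num / (3168 * (r + 1))
```

At $r = \frac49$, which lies in the range of Proposition 5.1, `distortion(DELTA3, r)` comes out below `distortion(ALPHA3, r)`, in line with the statement of the proposition.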

Let us now give the following theorem. 
Theorem 5.2. Let $n \in \mathbb{N}$ be such that $n$ is not of the form $2^{\ell(n)}$ for any $\ell(n) \in \mathbb{N}$. Let $\alpha_n(I)$ be the set as defined in Section 3. Then, $\alpha_n(I)$ does not form an optimal CVT for $0.4371985206 < r < 0.4384471872$.

Proof. By Proposition 5.1, we see that for $0.4364590141 < r < 0.4486234903$, both $\delta_n(I)$ and $\alpha_n(I)$ form CVTs, and $V(P, \delta_n(I)) < V(P, \alpha_n(I))$. Thus, $\alpha_n(I)$ does not form an optimal CVT for $0.4371985206 < r < 0.4384471872$, which is the theorem. □

Remark 5.3. Comparing Proposition 4.3 and Proposition 5.1, if $n$ is not of the form $2^{\ell(n)}$ for any positive integer $\ell(n)$, we can say that if $0.4364590141 < r < 0.4384471872$, then the CVT $\beta_n(I)$, which is obtained using the formula given in [GL2], does not form an optimal CVT. The least upper bound of $r$ for which $\beta_n(I)$ forms an optimal CVT is still unknown. The investigation of it will appear elsewhere.